Solving Prediction Games with Parallel Batch Gradient Descent

Authors

  • Michael Großhans
  • Tobias Scheffer
Abstract

Learning problems in which an adversary can perturb instances at application time can be modeled as games with data-dependent cost functions. At an equilibrium point, the learner's model parameters are the optimal reaction to the data generator's perturbation, and vice versa. Finding an equilibrium point requires solving a difficult optimization problem in which both the learner's model parameters and the possible perturbations are free parameters. We study a perturbation model and derive optimization procedures that use a single iteration of batch-parallel gradient descent and a subsequent aggregation step, thus allowing for parallelization with minimal synchronization overhead.
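To make the parallelization scheme concrete, the following Python sketch shows one batch-parallel step for such a game, assuming a logistic-loss learner, an adversary that adds a perturbation `delta` to each data batch, and L2 penalties `lam` and `gamma` on the model and the perturbation. The loss, the update rules, and the names `batch_gradients` and `parallel_game_step` are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np
from multiprocessing import Pool

def batch_gradients(args):
    """Gradients of a regularized game objective on one data batch, taken w.r.t.
    both the model w (learner) and the batch's perturbation delta (adversary)."""
    w, X, y, delta, lam, gamma = args
    Xp = X + delta                           # adversarially perturbed instances
    s = -y / (1.0 + np.exp(y * (Xp @ w)))    # derivative of the logistic loss w.r.t. the margin
    grad_w = Xp.T @ s / len(y) + lam * w                    # learner's descent direction
    grad_delta = np.outer(s, w) / len(y) - gamma * delta    # adversary's ascent direction
    return grad_w, grad_delta

def parallel_game_step(w, batches, deltas, lam=1e-2, gamma=1e-1, eta=0.1):
    """One synchronization round: every worker evaluates its batch gradient
    independently; a single aggregation step then updates the model and the
    perturbations."""
    with Pool() as pool:
        grads = pool.map(batch_gradients,
                         [(w, X, y, d, lam, gamma)
                          for (X, y), d in zip(batches, deltas)])
    w = w - eta * np.mean([g_w for g_w, _ in grads], axis=0)        # aggregate and descend
    deltas = [d + eta * g_d for (_, g_d), d in zip(grads, deltas)]  # adversary ascends
    return w, deltas
```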


Similar Resources

One Network to Solve Them All — Solving Linear Inverse Problems using Deep Projection Models

We now describe the architecture of the networks used in the paper. We use the exponential linear unit (ELU) [1] as the activation function. We also use virtual batch normalization [6], where the reference batch size b_ref is equal to the batch size used for stochastic gradient descent. We weight the reference batch by b_ref / (b_ref + 1). We define some shorthands for the basic components used in the networks.
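As a hedged illustration of that weighting, the sketch below normalizes each input with statistics formed from a fixed reference batch (weight b_ref / (b_ref + 1)) and the current example (weight 1 / (b_ref + 1)). The class and its interface are assumptions for illustration, not the networks used in the paper.

```python
import numpy as np

class VirtualBatchNorm:
    """Per-feature virtual batch normalization with a fixed reference batch."""
    def __init__(self, reference_batch, eps=1e-5):
        self.ref = reference_batch              # drawn once, before training starts
        self.b_ref = reference_batch.shape[0]
        self.eps = eps

    def __call__(self, x):
        w_ref = self.b_ref / (self.b_ref + 1.0)   # weight of the reference batch
        w_cur = 1.0 / (self.b_ref + 1.0)          # weight of the current example
        mean = w_ref * self.ref.mean(axis=0) + w_cur * x
        var = w_ref * self.ref.var(axis=0) + w_cur * (x - mean) ** 2
        return (x - mean) / np.sqrt(var + self.eps)
```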


Parallel Dither and Dropout for Regularising Deep Neural Networks

Effective regularisation during training can mean the difference between success and failure for deep neural networks. Recently, dither has been suggested as an alternative to dropout for regularisation during batch-averaged stochastic gradient descent (SGD). In this article, we show that these methods fail without batch averaging and we introduce a new, parallel regularisation method that may be ...
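A minimal sketch of what dither-based regularisation under batch averaging can look like: each example is perturbed with low-amplitude uniform noise before its gradient is computed, and the per-example gradients are then averaged. The gradient callback, noise scale, and function name are illustrative assumptions rather than the article's method.

```python
import numpy as np

def dithered_batch_gradient(w, X, y, grad_fn, noise_scale=0.1, rng=np.random):
    """Batch-averaged gradient computed on dithered (noise-perturbed) inputs."""
    X_dithered = X + rng.uniform(-noise_scale, noise_scale, size=X.shape)
    per_example = [grad_fn(w, x, t) for x, t in zip(X_dithered, y)]
    return np.mean(per_example, axis=0)   # batch averaging smooths the injected noise
```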


Parallel Collaborative Filtering for Streaming Data

We present a distributed stochastic gradient descent algorithm for performing low-rank matrix factorization on streaming data. Low-rank matrix factorization is often used as a technique for collaborative filtering. As opposed to recent algorithms that perform matrix factorization in parallel on a batch of training examples [4], our algorithm operates on a stream of incoming examples. We experim...
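The core building block of such an algorithm is the per-observation SGD update of the two factor matrices. The sketch below shows one update on a streamed (user, item, rating) triple; the learning rate, regularization constant, and function name are illustrative assumptions.

```python
import numpy as np

def streaming_mf_update(U, V, user, item, rating, eta=0.05, lam=0.01):
    """One SGD step on the squared error of a single streamed rating."""
    u, v = U[user].copy(), V[item].copy()    # read old factors before updating either
    err = rating - u @ v                     # prediction error on this observation
    U[user] += eta * (err * v - lam * u)
    V[item] += eta * (err * u - lam * v)
```

In distributed variants of this update, users and items are typically partitioned into disjoint blocks so that concurrent steps never touch the same factor rows.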


Accelerated Mini-Batch Stochastic Dual Coordinate Ascent

Stochastic dual coordinate ascent (SDCA) is an effective technique for solving regularized loss minimization problems in machine learning. This paper considers an extension of SDCA under the mini-batch setting that is often used in practice. Our main contribution is to introduce an accelerated mini-batch version of SDCA and prove a fast convergence rate for this method. We discuss an implementati...
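For orientation, here is a hedged sketch of a plain (non-accelerated) mini-batch SDCA epoch for ridge regression, where the dual coordinate update has a closed form. The 1/b scaling of simultaneous updates is a conservative illustrative choice; the accelerated variant discussed above adds a momentum sequence on top.

```python
import numpy as np

def minibatch_sdca_epoch(X, y, alpha, lam, batch_size=16, rng=np.random):
    """One pass of mini-batch SDCA on the squared loss with L2 regularization."""
    n, _ = X.shape
    w = X.T @ alpha / (lam * n)                    # primal iterate induced by the duals
    for _ in range(n // batch_size):
        idx = rng.choice(n, batch_size, replace=False)
        # Closed-form dual updates, all computed from the same primal iterate w.
        deltas = (y[idx] - X[idx] @ w - alpha[idx]) \
                 / (1.0 + np.sum(X[idx] ** 2, axis=1) / (lam * n))
        deltas /= batch_size                       # conservative scaling of simultaneous updates
        alpha[idx] += deltas
        w += X[idx].T @ deltas / (lam * n)         # keep w consistent with the dual variables
    return w, alpha
```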


Stochastic Proximal Gradient Descent with Acceleration Techniques

Proximal gradient descent (PGD) and stochastic proximal gradient descent (SPGD) are popular methods for solving regularized risk minimization problems in machine learning and statistics. In this paper, we propose and analyze an accelerated variant of these methods in the mini-batch setting. This method incorporates two acceleration techniques: one is Nesterov’s acceleration method, and the othe...
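A minimal sketch of how the two ingredients fit together: a mini-batch stochastic gradient evaluated at a Nesterov-extrapolated point, followed by a proximal step (here soft-thresholding for an L1 regularizer). The gradient callback, step size, and momentum schedule are illustrative assumptions, not the specific method analyzed in the paper.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (L1 regularization)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def accelerated_spgd(grad_fn, X, y, dim, lam=0.01, eta=0.1,
                     batch_size=32, n_iter=1000, rng=np.random):
    w = np.zeros(dim)   # current iterate
    z = w.copy()        # extrapolated (look-ahead) point
    t = 1.0
    for _ in range(n_iter):
        idx = rng.choice(len(y), batch_size, replace=False)
        g = grad_fn(z, X[idx], y[idx])                      # mini-batch gradient at z
        w_new = soft_threshold(z - eta * g, eta * lam)      # proximal (shrinkage) step
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))    # FISTA-style momentum schedule
        z = w_new + ((t - 1.0) / t_new) * (w_new - w)       # Nesterov extrapolation
        w, t = w_new, t_new
    return w
```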




Publication date: 2015